On Reward-Free Reinforcement Learning with Linear Function Approximation
Reward-free reinforcement learning (RL) is a framework suitable both for the batch RL setting and for settings where there are many reward functions of interest. During the exploration phase, an agent collects samples without using a pre-specified reward function. After the exploration phase, a reward function is given, and the agent uses the samples collected during exploration to compute a near-optimal policy. Jin et al. [2020] showed that in the tabular setting, the agent only needs to collect a polynomial number of samples (in terms of the number of states, the number of actions, and the planning horizon) for reward-free RL. However, in practice, the number of states and actions can be large, and thus function approximation schemes are required for generalization.
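The two-phase protocol the abstract describes can be sketched in a toy tabular setting. Everything below is invented for illustration and is not from the paper: a 5-state chain MDP, uniform-random exploration, and a hypothetical downstream reward revealed only in the planning phase.

```python
import random
from collections import defaultdict

random.seed(0)

# A tiny 5-state deterministic chain MDP, invented for this sketch.
N_STATES, N_ACTIONS, HORIZON = 5, 2, 10

def step(s, a):
    # Action 0 moves left, action 1 moves right along the chain.
    return max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)

# --- Exploration phase: collect transitions with no reward signal ---
counts = defaultdict(lambda: defaultdict(int))  # counts[(s, a)][s'] = visits
for _ in range(2000):
    s = 0
    for _ in range(HORIZON):
        a = random.randrange(N_ACTIONS)  # reward-free: here, uniform random
        s2 = step(s, a)
        counts[(s, a)][s2] += 1
        s = s2

# --- Planning phase: a reward function is revealed only now ---
def reward(s, a):
    # Hypothetical downstream task: reward for being at the right end.
    return 1.0 if s == N_STATES - 1 else 0.0

def plan(reward_fn):
    # Finite-horizon value iteration on the empirical transition model
    # built from the exploration data; returns the start state's value.
    P = {sa: {s2: c / sum(d.values()) for s2, c in d.items()}
         for sa, d in counts.items()}
    V = [0.0] * N_STATES
    for _ in range(HORIZON):
        V = [max(reward_fn(s, a)
                 + sum(p * V[s2] for s2, p in P.get((s, a), {}).items())
                 for a in range(N_ACTIONS))
             for s in range(N_STATES)]
    return V[0]

print(plan(reward))  # → 6.0 (reach the end in 4 steps, collect reward 6 times)
```

Because the same exploration data can be reused, `plan` can be called with any number of reward functions after the fact, which is the point of the reward-free framework.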
Review for NeurIPS paper: On Reward-Free Reinforcement Learning with Linear Function Approximation
I would just like to confirm my understanding of the algorithmic contributions of this work. As far as I understand, Jin et al. [2019] propose a learning algorithm for the standard RL setting with linear function approximation in linear MDPs. Jin et al. [2020] then propose a method for efficient exploration in the reward-free RL setting; that work addresses standard MDPs, but only in the tabular case. There, exploration is achieved by constructing a reward function whose reward is 1 for states that are "significant" and 0 otherwise, and then solving the resulting task with an efficient learning algorithm.
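The construction the review describes can be illustrated with a hedged toy sketch: plant an indicator reward at a designated "significant" state and plan against it, yielding a policy that steers toward that state. The 4-state MDP and its transition table below are invented for illustration; the actual tabular algorithm plans with an estimated model and exploration bonuses rather than a known model.

```python
# Invented 4-state, 2-action deterministic MDP: P[s][a] = next state.
N_STATES, N_ACTIONS, HORIZON = 4, 2, 6
P = [[1, 0], [2, 0], [3, 1], [3, 2]]

def greedy_policy_for(target):
    # Synthetic indicator reward: 1 only at the "significant" state.
    r = lambda s: 1.0 if s == target else 0.0
    V = [0.0] * N_STATES
    pi = [0] * N_STATES
    for _ in range(HORIZON):  # finite-horizon value iteration
        newV = []
        for s in range(N_STATES):
            qs = [r(s) + V[P[s][a]] for a in range(N_ACTIONS)]
            pi[s] = max(range(N_ACTIONS), key=lambda a: qs[a])
            newV.append(max(qs))
        V = newV
    return pi

# The resulting greedy policy drives every state toward state 3,
# so executing it during exploration collects data about that state.
pi = greedy_policy_for(3)
```

Repeating this for each significant state produces a set of exploratory policies whose combined data covers the reachable part of the state space, which is what makes the later reward-specific planning phase possible.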
Review for NeurIPS paper: On Reward-Free Reinforcement Learning with Linear Function Approximation
The authors study sequential decision processes without a reward function. The goal is to learn the transition dynamics so that various reward functions can be optimized efficiently in the future. The authors extend recent work to the linear function approximation case. They provide an analysis of the sample complexity, and show that while for linear MDPs the complexity is polynomial, this is not true for MDPs with linear optimal value functions, providing insight into the hardness of this second class of problems. The strengths of the paper are the theoretical development of the algorithm and the lower bound for MDPs with linear optimal Q-functions.